David Sontag – Research Statement

Author

  • David Sontag
Abstract

In recent years, advances in science and low-cost permanent storage have resulted in the availability of massive data sets. Together with advances in machine learning, this data has the potential to lead to many new breakthroughs. For example, high-throughput genomic and proteomic experiments can be used to enable personalized medicine. Large data sets of search queries can be used to improve information retrieval. Historical climate data can be used to understand global warming and to better predict weather. However, to take full advantage of this data, we need models that are capable of explaining the data, and algorithms that can use the models to make predictions about the future.

The goal of my research is to develop theory and practical algorithms for learning and probabilistic inference in very large statistical models. My research focuses on a class of statistical models called graphical models that describe multivariate probability distributions. Graphical models provide a useful abstraction for quantifying uncertainty, describing complex dependencies in data while making the model’s structure explicit so that it can be exploited by algorithms. These models have been widely applied across diverse fields such as statistical machine learning, computational biology, statistical physics, communication theory, and information retrieval.

For example, consider the problem of predicting the relative orientations of a protein’s side-chains with respect to its backbone structure, a fundamental question about protein folding. Given an appropriate energy function, the prediction can be made by finding the side-chain configuration that has minimal energy. This is equivalently formulated as inference in a graphical model, where the distribution is given in terms of compatibility functions between pairs of side-chains and the backbone that take into consideration attractive and repulsive forces.
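The energy-minimization view of the side-chain example can be made concrete with a small sketch. This is an illustration only, not the method developed in the research statement: it finds the minimum-energy configuration of a toy pairwise model by exhaustive search. The function names, the three-variable chain, and all energy values are hypothetical and chosen purely for illustration. Unary terms stand in for side-chain/backbone interactions, and pairwise terms for interactions between pairs of side-chains.

```python
from itertools import product


def total_energy(assignment, unary, pairwise):
    """Sum the unary and pairwise energy terms for one joint assignment."""
    energy = sum(unary[i][state] for i, state in enumerate(assignment))
    for (i, j), table in pairwise.items():
        energy += table[assignment[i]][assignment[j]]
    return energy


def min_energy_configuration(unary, pairwise):
    """Brute-force search over every joint assignment.

    The number of assignments is the product of the state-space sizes,
    so this is only feasible for very small models.
    """
    state_ranges = [range(len(energies)) for energies in unary]
    return min(
        product(*state_ranges),
        key=lambda a: total_energy(a, unary, pairwise),
    )


# Hypothetical toy model: three variables with two states each.
# unary[i][s] is the energy of variable i taking state s.
unary = [[0.0, 1.0], [0.5, 0.0], [0.25, 0.5]]
# pairwise[(i, j)][s][t] is the energy of variables i and j taking states s and t.
pairwise = {
    (0, 1): [[0.0, 2.0], [2.0, 0.0]],  # penalizes disagreement between 0 and 1
    (1, 2): [[1.0, 0.0], [0.0, 1.0]],  # penalizes agreement between 1 and 2
}

best = min_energy_configuration(unary, pairwise)
print(best, total_energy(best, unary, pairwise))
```

With n variables of k states each, the search visits k^n assignments, so exhaustive search of this kind stops being usable well before a model reaches realistic size.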
Statistical models are powerful because, once estimated, they enable us to make predictions from previously unseen observations (e.g., to predict the folding of a newly discovered protein). The key obstacle to using graphical models is that exact inference is NP-hard in general, and therefore computationally intractable in the worst case. Finding the most likely protein side-chain configuration, for example, is a difficult combinatorial optimization problem. However, many applications of graphical models have hundreds to millions of variables and long-range dependencies. To be useful, probabilistic inference needs to be both fast and accurate.

My doctoral research provides a new framework for approximate inference in graphical models based on linear programming relaxations. The resulting algorithms are practical and accurate, rest on a solid theoretical foundation, and are likely to cement the role of linear programming relaxations as the de facto tool for probabilistic inference. More broadly, my research highlights emerging connections between machine learning, polyhedral combinatorics, and combinatorial optimization.

I am excited by both practical and theoretical problems. I frequently discuss research with students and faculty across all of EECS at MIT, colleagues at Google, and beyond. These interactions have helped me recognize key challenges in machine learning whose solutions will have a significant impact across many fields. Other research directions of mine are directly motivated by concrete practical problems. For example, my work on routing in overlay networks arose from my frustration with poor-quality voice-over-IP conversations [13]. More generally, I am interested in new applications of machine learning that change the way we see and interact with the world.

Similar resources

Predicting Chief Complaints at Triage Time in the Emergency Department

As hospitals increasingly use electronic medical records for research and quality improvement, it is important to provide ways to structure medical data without losing either expressiveness or time. We present a system that helps achieve this goal by building an extended ontology of chief complaints and automatically predicting a patient’s chief complaint, based on their vitals and the nurses’ ...

Stability and Stabilization: Discontinuities and the Effect of Disturbances

In this expository paper, we deal with several questions related to stability and stabilization of nonlinear finite-dimensional continuous-time systems. We review the basic problem of feedback stabilization, placing an emphasis upon relatively new areas of research which concern stability with respect to “noise” (such as errors introduced by actuators or sensors). The topics covered are as foll...

A Note on Multistability and Monotone I/O Systems

Multi-stability and hysteresis are important systems properties in many applications, and particularly in biology. This paper studies such properties in the framework of monotone systems with well-defined steady-state responses. Characterizations of global stability behavior are stated in terms of easily checkable graphical conditions.

A Feedback Perspective for Chemostat Models with Crowding Effects

This paper deals with an almost global stability result for a chemostat model with crowding effects. The proof relies on a particular small-gain theorem which has recently been developed for feedback interconnections of monotone systems.

Publication date: 2009